
    The Superiority of the Ensemble Classification Methods: A Comprehensive Review

    Modern technologies, characterized by cyber-physical systems and the Internet of Things, expose organizations to big data, which in turn can be processed to derive actionable knowledge. Machine learning techniques have been widely employed in both supervised and unsupervised settings to develop systems capable of making informed decisions based on past data. To enhance the accuracy of supervised learning algorithms, various classification-based ensemble methods have been developed. Herein, we review the superiority exhibited by ensemble learning algorithms, drawing on past work carried out over the years. We then compare and discuss the common classification-based ensemble methods, with an emphasis on the boosting and bagging ensemble-learning models. We conclude by setting out the superiority of ensemble learning models over individual base learners. Keywords: Ensemble, supervised learning, Ensemble model, AdaBoost, Bagging, Randomization, Boosting, Strong learner, Weak learner, classifier fusion, classifier selection, Classifier combination. DOI: 10.7176/JIEA/9-5-05 Publication date: August 31st 2019
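To make the boosting idea concrete, here is a minimal AdaBoost sketch in plain Python. It assumes one-dimensional inputs, labels in {-1, +1} and decision stumps as the weak learners; the function names and the toy data are illustrative and are not taken from the review. Each round fits a stump to the current sample weights, gives it a vote alpha = 0.5 * ln((1 - err) / err), and up-weights the samples that stump got wrong.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` when x >= threshold, -polarity otherwise."""
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=10):
    """Train AdaBoost with stumps; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n              # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        if err >= 0.5:                   # no better than chance: stop boosting
            break
        err = max(err, 1e-10)            # guard against log/division blow-up on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Up-weight misclassified samples, down-weight the rest, then renormalize
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Alpha-weighted vote of all stumps (a zero score goes to +1)."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: positives on both sides of a negative band, so no single stump fits it
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, -1, -1, -1, 1, 1, 1]
single_err, _, _ = train_stump(xs, ys, [1 / len(xs)] * len(xs))
model = adaboost(xs, ys, rounds=3)
acc = sum(ensemble_predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)
print(f"best single stump accuracy: {1 - single_err:.2f}")
print(f"3-round AdaBoost accuracy: {acc:.2f}")
```

On this toy data the best single stump reaches 0.75 training accuracy, while three boosting rounds classify all eight points correctly. Bagging differs in that each learner is trained on a bootstrap resample and all learners get equal votes, instead of being trained on reweighted data.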

    An Elliptic Curve Digital Signature Algorithm (ECDSA) for Securing Data: An Exemplar of Securing Patient's Data

    The conference aimed to support and stimulate productive research that strengthens the technical foundations of engineers and scientists on the continent, through developing strong technical foundations and skills, leading to new small and medium enterprises within the African sub-continent. It also sought to encourage the emergence of functionally skilled technocrats within the continent. In this paper, we present the progress of our work on the creation and implementation of an Elliptic Curve Digital Signature Algorithm (ECDSA). We present the design of the algorithm and its application to securing medical data. ECDSA PHP ECC code has been used to implement digital signatures over the elliptic curve P-256. The work presented highlights a practical implementation of ECDSA signature generation to secure and authenticate patient laboratory test results in a Laboratory Information System (LIS). Future work will demonstrate the implementation of decryption using the ECDSA. Given the inherent strength of Elliptic Curves (EC) in securing data, our algorithm is highly secure and can be adapted in many areas where data privacy and security are paramount. Strathmore University; Institute of Electrical and Electronics Engineers (IEEE)
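The sign/verify flow of ECDSA can be sketched in plain Python. Note that ECDSA is a signature scheme: it provides authentication and integrity, not confidentiality. To keep every number checkable, this sketch assumes the textbook toy curve y^2 = x^3 + 2x + 2 over F_17 with generator (5, 1) of order 19 instead of P-256, and it simplifies the hashing step; the parameters are far too small to be secure, and the function names are hypothetical, not the authors' PHP code.

```python
import hashlib
import secrets

# Textbook toy curve y^2 = x^3 + 2x + 2 over F_17. The group has 19 points
# (a prime), so the generator G = (5, 1) has order N = 19.
# Far too small to be secure: P-256 uses ~256-bit parameters instead.
P, A, B = 17, 2, 2
G = (5, 1)
N = 19

def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                           # p2 == -p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P      # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P       # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, point):
    """Compute k * point with double-and-add."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def hash_to_int(message):
    # Simplification: reduce SHA-256 mod N (real ECDSA keeps the leftmost bits of the hash)
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def keygen():
    d = secrets.randbelow(N - 1) + 1          # private key in [1, N-1]
    return d, scalar_mult(d, G)               # (private key, public point Q = dG)

def sign(message, d):
    z = hash_to_int(message)
    while True:
        k = secrets.randbelow(N - 1) + 1      # fresh secret nonce per signature
        r = scalar_mult(k, G)[0] % N
        if r == 0:
            continue
        s = pow(k, -1, N) * (z + r * d) % N
        if s != 0:
            return r, s

def verify(message, signature, Q):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    z = hash_to_int(message)
    w = pow(s, -1, N)
    X = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, Q))
    return X is not None and X[0] % N == r

d, Q = keygen()
record = b"sample lab result: HbA1c 5.6%"     # hypothetical LIS record
signature = sign(record, d)
print(verify(record, signature, Q))           # True
```

Verification works because u1 + u2 * d is congruent to k modulo N, so u1*G + u2*Q recovers the signer's point k*G. Swapping in P-256 only changes the curve constants and requires the standard hash truncation.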

    Using Ensemble Technique to Improve Multiclass Classification

    Many real-world applications inevitably involve datasets with a multiclass structure characterized by imbalanced classes and by redundant and irrelevant features that degrade classifier performance. Minority classes in these datasets are treated as outlier classes. The research aimed to establish the role of ensemble techniques in improving the performance of multiclass classification. Multiclass datasets were transformed to binary, and the datasets were resampled using the Synthetic Minority Oversampling Technique (SMOTE) algorithm. Relevant features were selected using an ensemble filter method developed from the Correlation, Information Gain, Gain-Ratio and ReliefF filter selection methods. The AdaBoost and Random Subspace learning algorithms were combined using a voting methodology, with random forest as the base classifier. The classifiers were evaluated using 10-fold stratified cross-validation. The model showed better performance in outlier detection and classification prediction for the multiclass problem, outperforming other well-known classification and outlier detection algorithms such as Naïve Bayes, KNN, Bagging, JRipper, Decision trees, RandomTree and Random forest. The study findings established that ensemble techniques, dataset resampling and multiclass decomposition result in improved classification performance as well as enhanced detection of minority outlier (rare) classes. Keywords: Multiclass, Classification, Outliers, Ensemble, Learning Algorithm DOI: 10.7176/JIEA/9-5-04 Publication date: August 31st 201
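The abstract does not say how the four filter scores are merged into one ensemble filter. A minimal sketch, assuming mean-rank aggregation and showing only two of the four criteria (information gain on discrete features and absolute Pearson correlation); gain ratio and ReliefF are omitted, and the function names and toy features are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction from splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def abs_correlation(feature, labels):
    """Absolute Pearson correlation between a numeric feature and numeric labels."""
    n = len(feature)
    mf, ml = sum(feature) / n, sum(labels) / n
    cov = sum((f - mf) * (y - ml) for f, y in zip(feature, labels))
    sf = math.sqrt(sum((f - mf) ** 2 for f in feature))
    sl = math.sqrt(sum((y - ml) ** 2 for y in labels))
    return 0.0 if sf == 0 or sl == 0 else abs(cov / (sf * sl))

def ensemble_filter(features, labels, scorers):
    """Order features by their mean rank (0 = best) across several filter scores."""
    names = list(features)
    mean_rank = dict.fromkeys(names, 0.0)
    for scorer in scorers:
        scores = {name: scorer(features[name], labels) for name in names}
        for rank, name in enumerate(sorted(names, key=scores.get, reverse=True)):
            mean_rank[name] += rank / len(scorers)
    return sorted(names, key=mean_rank.get)

labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_const": [1, 1, 1, 1, 1, 1, 1, 1],   # carries no information
    "f_noisy": [0, 0, 0, 1, 1, 1, 1, 0],   # partly agrees with the label
    "f_good":  [0, 0, 0, 0, 1, 1, 1, 1],   # identical to the label
}
print(ensemble_filter(features, labels, [information_gain, abs_correlation]))
```

Averaging ranks rather than raw scores sidesteps the fact that the criteria live on different scales; a real pipeline would then keep the top-ranked features.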

    An Ensemble Model for Multiclass Classification and Outlier Detection Method in Data Mining

    Real-world datasets exhibit a multiclass classification structure characterized by imbalanced classes. Minority classes are treated as outlier classes. The study used the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. A heterogeneous multiclass ensemble was developed by combining several strategies and ensemble techniques. The datasets used were drawn from the UCI machine learning repository. Validation experiments were conducted, and their results are presented as tables and figures. An ensemble filter selection method was developed and used for preprocessing the datasets. Point outliers were filtered using the interquartile range (IQR) filter algorithm. Datasets were resampled using the Synthetic Minority Oversampling Technique (SMOTE) algorithm. Multiclass datasets were transformed into binary classes using the One-vs-One decomposition technique. An ensemble model was developed using the AdaBoost and random subspace algorithms, with random forest as the base classifier. The classifiers built were combined using a voting methodology. The model was validated with classification and outlier performance metrics such as Recall, Precision, F-measure and AUC-ROC. The classifiers were evaluated using 10-fold stratified cross-validation. The model showed better performance in outlier detection and classification prediction for the multiclass problem, outperforming other well-known classification and outlier detection algorithms such as Naïve Bayes, KNN, Bagging, JRipper, Decision trees, RandomTree and Random forest. The study findings established that ensemble techniques, dataset resampling and multiclass decomposition result in improved detection of minority outlier (rare) classes. Keywords: Multiclass, Outlier, Ensemble, Model, Classification DOI: 10.7176/JIEA/9-2-04 Publication date: April 30th 2019
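The abstract lists two preprocessing steps without their parameters. A minimal sketch of both, assuming Tukey's usual fences (k = 1.5) for the interquartile range filter applied to a single numeric attribute, and the standard SMOTE interpolation between a minority sample and one of its k nearest minority neighbours (k = 3 here only because the toy set is tiny). The function names and data are hypothetical.

```python
import random
import statistics

def iqr_filter(values, k=1.5):
    """Keep values inside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of the chosen sample (squared Euclidean distance)
        neighbours = sorted(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()   # random position on the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

print(iqr_filter([10, 11, 12, 13, 14, 15, 100]))   # the point outlier 100 is dropped
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
print(smote(minority, n_new=4))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling densifies the minority region instead of duplicating points. Filtering outliers before resampling keeps SMOTE from interpolating towards them.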

    Fuzzy Contrast Improvement for Low Altitude Aerial Images

    Precision agriculture is becoming very important for improving food security. Unmanned Aerial Vehicles (UAVs) offer great potential here, improving the real-time data gathered with aerial sensors. Fuzzy techniques have proved highly effective in managing vagueness and ambiguity. Unmanned helicopters are particularly valuable because of their high maneuverability. We believe that many different degrees of UAV autonomy and functionality will be useful in agriculture. We present a new process for extracting data from aerial images captured by low-altitude UAVs. We combined the output of the NDVI algorithm with the RSWHE-M method on greyscale images. Preliminary results show that our method produces images that are visually acceptable to the human eye and have a natural appearance.
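The NDVI step can be illustrated with its standard definition, NDVI = (NIR - Red) / (NIR + Red), which lies in [-1, 1]. A minimal sketch, assuming the result is rescaled linearly to 8-bit grey levels before any further enhancement; the RSWHE-M step is not shown, and the function names and the 2x2 bands are hypothetical.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); pixels with NIR + Red == 0 map to 0."""
    return [[(n - r) / (n + r) if n + r else 0.0 for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir, red)]

def to_grey(index_image):
    """Linearly rescale NDVI from [-1, 1] to 8-bit grey levels [0, 255]."""
    return [[round((v + 1) / 2 * 255) for v in row] for row in index_image]

nir = [[200, 50], [120, 0]]   # hypothetical 2x2 near-infrared band
red = [[50, 200], [120, 0]]   # hypothetical 2x2 red band
print(to_grey(ndvi(nir, red)))
```

Vegetated pixels (high NIR, low red) come out bright and bare or water pixels dark, which gives a greyscale image that a histogram-based contrast method can then work on.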

    Automated Detection of Cervical Pre-Cancerous Lesions Using Regional-Based Convolutional Neural Network

    The cervical colposcopy image is an image of a woman's cervix taken with a digital colposcope after the application of acetic acid. Captured cervical images must be interpreted for the diagnosis, prognosis and treatment planning of anomalies. This cervix image interpretation is generally performed by skilled medical professionals. However, the scarcity of human medical experts, together with fatigue and the rough estimation procedures involved, limits the effectiveness of image interpretation performed by skilled medical professionals. In this paper, the model uses a Regional-Based Convolutional Neural Network (R-CNN) to effectively visualize pre-cancerous lesions and to aid in diagnosis of the disease. The model was trained on a dataset of 10,383 cervical image samples derived from public dataset repositories. The training samples comprised type 1, 2 and 3 cervical precancerous traits. Performance was evaluated using a K-nearest-neighbor model over the R-CNN. With an accuracy rate of 86%, this approach heralds a promising development in the detection of cervical precancerous lesions. The study findings established that the proposed model provides better accuracy and misclassification performance than various tested algorithms.
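The abstract does not say exactly how the K-nearest-neighbor model is combined with the R-CNN. One plausible reading, assumed here, is that feature vectors produced for detected regions are classified by majority vote among their nearest labelled neighbours. The sketch uses tiny hypothetical 2-D vectors in place of real CNN features, labels 1 to 3 mirroring the three lesion types, and reports accuracy, the metric quoted above; the function names are illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training vectors nearest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test (vector, label) pairs whose predicted label is correct."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

# Hypothetical 2-D stand-ins for region features, labelled with lesion types 1-3
train = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1),
         ((5, 5), 2), ((5, 6), 2), ((6, 5), 2),
         ((10, 0), 3), ((10, 1), 3), ((11, 0), 3)]
test = [((0.5, 0.5), 1), ((5.5, 5.5), 2), ((10.5, 0.5), 3)]
print(accuracy(train, test))
```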

    Feature Based Data Anonymization for High Dimensional Data

    Surges of information and advances in machine learning tools have enabled the collection and storage of large amounts of data. These data are high-dimensional. Individuals are deeply concerned about the consequences of sharing and publishing these data, as they may contain personal information and compromise privacy. Anonymization techniques have been widely used to protect sensitive information in published datasets. However, anonymizing high-dimensional data while balancing privacy and utility remains a challenge. In this paper, we use feature selection with information gain and ranking to demonstrate that the challenge of high dimensionality can be addressed by anonymizing less relevant attributes more heavily. We conduct experiments with real-life datasets and build classifiers on the anonymized datasets. Our results show that by combining feature selection with slicing, and by reducing data distortion for highly relevant features, the utility of the anonymized dataset can be enhanced. Keywords: High Dimension, Privacy, Anonymization, Feature Selection, Classifier, Utility DOI: 10.7176/JIEA/9-2-03 Publication date: April 30th 201
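The ranking idea can be sketched as follows: score each attribute by its information gain with respect to the class label, leave the most relevant attributes untouched, and distort the rest. This is a minimal sketch only; plain suppression (replacing values with "*") stands in for the paper's slicing-based anonymization, and the records, attributes and `keep_top` parameter are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in Counter(values).values())

def information_gain(column, labels):
    """Entropy of the class minus its expected entropy after splitting on `column`."""
    total = len(labels)
    remainder = sum(
        count / total * entropy([y for x, y in zip(column, labels) if x == value])
        for value, count in Counter(column).items()
    )
    return entropy(labels) - remainder

def anonymize_by_relevance(records, labels, keep_top):
    """Rank attributes by information gain and suppress ('*') all but the top `keep_top`."""
    attrs = list(records[0])
    gains = {a: information_gain([r[a] for r in records], labels) for a in attrs}
    ranked = sorted(attrs, key=gains.get, reverse=True)
    keep = set(ranked[:keep_top])
    return ranked, [{a: (r[a] if a in keep else "*") for a in attrs} for r in records]

records = [
    {"age_band": "30s", "zip": "100", "smoker": "y"},
    {"age_band": "30s", "zip": "200", "smoker": "y"},
    {"age_band": "30s", "zip": "100", "smoker": "n"},
    {"age_band": "40s", "zip": "200", "smoker": "n"},
]
labels = ["yes", "yes", "no", "no"]
ranked, released = anonymize_by_relevance(records, labels, keep_top=1)
print(ranked)
print(released)
```

Here "smoker" fully determines the label, "age_band" is weakly informative and "zip" carries no information, so distorting the low-ranked attributes costs little classification utility.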

    A Hybrid Ensemble Method for Multiclass Classification and Outlier Detection

    The multiclass problem remains an active research area because of the challenges posed by imbalanced datasets and the lack of a unifying classification algorithm. Real-world problems are multiclass in nature, with skewed class representations. The study focused on these challenges of multiclass classification. Multiclass datasets were adopted from the UCI machine learning repository. The research developed a heterogeneous ensemble model for multiclass classification and outlier detection that combines several strategies and ensemble techniques. Preprocessing involved filtering global outliers and resampling the datasets using the synthetic minority oversampling technique (SMOTE) algorithm. Dataset binarization was done using the One-vs-One decomposition technique. The heterogeneous ensemble model was constructed using the AdaBoost and random subspace algorithms, with random forest as the base classifier. The classifiers built were combined using the average-of-probabilities voting rule and evaluated using 10-fold stratified cross-validation. The model showed better performance in outlier detection and classification prediction for the multiclass problem, outperforming other commonly used classical algorithms. The study findings established that proper preprocessing and multiclass decomposition improve performance on minority outlier classes while safeguarding the integrity of the majority classes.
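Two combination steps named in the abstract can be sketched in isolation. One-vs-One decomposition builds one binary dataset per pair of classes (k(k-1)/2 of them for k classes), and the average-of-probabilities rule averages each classifier's per-class probability estimates and picks the class with the highest mean. The base classifiers are not shown, and the probability outputs below are hypothetical values for a single test instance.

```python
from itertools import combinations

def one_vs_one(samples):
    """Split (features, label) pairs into one binary dataset per pair of classes."""
    classes = sorted({label for _, label in samples})
    return {
        (a, b): [(x, label) for x, label in samples if label in (a, b)]
        for a, b in combinations(classes, 2)
    }

def average_of_probabilities(prob_estimates):
    """Average per-class probabilities from several classifiers; return the argmax."""
    classes = list(prob_estimates[0])
    mean = {c: sum(p[c] for p in prob_estimates) / len(prob_estimates) for c in classes}
    return max(mean, key=mean.get), mean

samples = [((0.1,), "a"), ((0.5,), "b"), ((0.9,), "c"), ((0.2,), "a")]
for pair, subset in one_vs_one(samples).items():
    print(pair, subset)

# Hypothetical probability outputs of two ensemble members for one test instance
boost_probs = {"a": 0.6, "b": 0.3, "c": 0.1}
subspace_probs = {"a": 0.2, "b": 0.7, "c": 0.1}
label, mean = average_of_probabilities([boost_probs, subspace_probs])
print(label, mean)
```

Unlike majority voting, averaging probabilities lets a confident member outweigh a hesitant one: here the first member prefers "a", but the averaged estimate favours "b".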

    Fuzzy Logic Patterns for Sentiment Analysis and in Imaging

    Fuzzy logic is today universally accepted as a discipline that has proven itself at the intersection of mathematics, computer science, cognitive science and Artificial Intelligence. In formal terms, fuzzy logic is an extension of classical logic whose aim is to measure the flexibility of human reasoning, and which allows the modeling of data imperfections, in particular the two most frequent ones: imprecision and uncertainty. Moreover, fuzzy logic does not obey the law of the excluded middle or the law of non-contradiction. In this short summary of the thesis, we will not restate and define all the concepts of this now-classical tool: membership function, membership degree, linguistic variable, fuzzy operators, fuzzification, defuzzification, approximate reasoning… One of the basic concepts of this logic is the notion of possibility, which makes it possible to model the membership function of a concept. The possibility of an event differs from its probability in that it is not intimately tied to that of the opposite event. For example, if the probability that it will rain tomorrow is 0.6, then the probability that it will not rain must be 0.4, whereas the possibilities that it will or will not rain tomorrow can both equal 1 (or any two other values whose sum may exceed 1). In computer science, unsupervised learning (or "clustering") is a quasi-autonomous machine learning method. The task is for an algorithm to divide a set of data into subgroups such that the data considered most similar are associated within a homogeneous group.
Once the algorithm proposes these groupings, the role of the expert or group of experts is to name each group, possibly splitting some or merging others, in order to create classes. The classes become real once the algorithm has run and the expert has named them. Once again, our work, like all work in the field, aims to adapt traditional learning and/or reasoning models to the imprecision of the real world. Sentiment analysis of textual resources and images in the context of precision agriculture allowed us to illustrate our hypotheses. The introduction, through our work, of the concept of fuzzy patterns is undoubtedly a major contribution. This work has led to three major contributions:
Standard (type-1) fuzzy sets were introduced to mimic human reasoning in its use of approximate information and uncertainty to generate decisions. Since knowledge can be expressed naturally using fuzzy sets, many decision problems can be greatly simplified. However, standard type-1 fuzzy sets have limitations when it comes to modeling human decision making. When Zadeh introduced higher types of fuzzy sets, called type-n fuzzy sets and type-2 fuzzy sets, the objective was to solve problems associated with modeling uncertainty using the crisp membership functions of type-1 fuzzy sets. The extra dimension of type-2 fuzzy sets provides more design freedom and flexibility than type-1 fuzzy sets. The ability of Fuzzy Logic Systems (FLS) to be hybridized with other methods has extended their use to many application domains. In architecture and software engineering, the concept of patterns was introduced as a way of creating general, repeatable solutions to commonly occurring problems in the respective fields. In software engineering, for example, a design pattern is not a finished design that can be transformed directly into code. It is a description or template for how to solve a problem that can be used in many different situations. This thesis introduces the novel concept of fuzzy patterns in T2 FLS. Micro-blogs and social media platforms are now among the most popular forms of online communication. Through a platform like Twitter™, a great deal of information reflecting people's opinions and attitudes is published and shared among users daily. This has created great opportunities for companies interested in tracking and monitoring the reputation of their brands and businesses, and for policy makers and politicians seeking to assess public opinion about their policies or political issues. This research demonstrates the importance of the neutral category in sentiment polarity analysis and then introduces the concept of fuzzy patterns in sentiment polarity analysis. Interval Type-2 Fuzzy Sets (IT2 FS) were proposed by reference [Men07c] to model words, because an IT2 FS is characterized by its Footprint Of Uncertainty (FOU), which provides a potential to capture word uncertainties. The use of IT2 FS in sentiment polarity classification is demonstrated. The importance of the neutral category is demonstrated in both supervised and unsupervised learning methods. In the final section, the concept of fuzzy patterns in contras
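An interval type-2 fuzzy set assigns each input an interval of membership grades, bounded below by a lower membership function and above by an upper one; the region between the two functions is the footprint of uncertainty (FOU). A minimal sketch with triangular functions follows. The word "good", the 0-10 rating scale and all parameters are hypothetical and are not taken from [Men07c] or from the thesis.

```python
def triangular(x, left, peak, right, height=1.0):
    """Triangular membership function with the given support and peak height."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return height * (x - left) / (peak - left)
    return height * (right - x) / (right - peak)

def it2_membership(x, upper, lower):
    """Membership interval (lower(x), upper(x)) of x in an interval type-2 fuzzy set."""
    return triangular(x, *lower), triangular(x, *upper)

# Hypothetical IT2 FS for the word "good" on a 0-10 rating scale
UPPER = (2, 6, 10, 1.0)   # upper membership function (outer triangle)
LOWER = (4, 6, 8, 0.6)    # lower membership function (inner, shorter triangle)
for x in (3, 5, 6):
    print(x, it2_membership(x, UPPER, LOWER))
```

A rating of 5 belongs to "good" with a grade somewhere between 0.3 and 0.75 rather than with one crisp number; the width of that interval is how the FOU encodes disagreement about what the word means.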

    Towards Universal Rating of Online Multimedia Content

    Most website classification systems have classified websites based on their content, design, usability, layout and similar properties; few have considered classification based on users' experience. The growth of online marketing and advertising has led to fierce competition, which has resulted in some websites using deceptive means to attract users. As a result, a user may visit a website and not get the promised results, wasting time, energy and sometimes even money. In this context, we design an experiment that uses a fuzzy linguistic model and data mining techniques to capture users' experiences; we then use the k-means clustering algorithm to cluster websites based on a set of feature vectors built from the users' perspective. Content unity is defined as the distance between the real content and its keywords. We demonstrate the use of the bisecting k-means algorithm for this task and show that the method can learn incrementally from users' profiles of their experience with these websites.
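Bisecting k-means starts with all items in one cluster and repeatedly splits one cluster in two with ordinary 2-means. A minimal sketch, assuming the cluster with the largest sum of squared errors (SSE) is split next (one common choice; the abstract does not specify it) and using hypothetical 2-D vectors in place of the users'-perspective website features. The incremental learning from user profiles is not shown, and the function names are illustrative.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def sse(cluster):
    """Sum of squared distances of a cluster's points to its centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)

def two_means(points, iters=20, seed=0):
    """Lloyd's algorithm with k = 2, used as the bisecting step."""
    centres = random.Random(seed).sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centres[0]) <= sq_dist(p, centres[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break
        centres = [centroid(groups[0]), centroid(groups[1])]
    return groups

def bisecting_kmeans(points, k, seed=0):
    """Start from one cluster; repeatedly bisect the cluster with the largest SSE."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        if len(clusters[worst]) < 2:
            break
        left, right = two_means(clusters[worst], seed=seed)
        if not left or not right:
            break                      # this cluster could not be split
        clusters.pop(worst)
        clusters += [left, right]
    return clusters

# Hypothetical 2-D feature vectors describing websites from the users' side
sites = [(1, 1), (1.2, 0.8), (0.9, 1.1),
         (5, 5), (5.1, 4.9), (4.8, 5.2),
         (9, 1), (9.2, 0.9), (8.8, 1.1)]
for cluster in bisecting_kmeans(sites, k=3):
    print(sorted(cluster))
```

Splitting one cluster at a time makes bisecting k-means less sensitive to the initial centres than running k-means once with k centres, and it yields a hierarchy of clusters as a by-product.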